Overview

This report describes the methods used to determine appropriate filter thresholds for SNV calls.


Creating simulated data

At the site level, three major filters were applied to obtain high-quality variants: variant quality normalized by read depth (QD), strand odds ratio (SOR) and Fisher strand (FS). To find the optimal filter thresholds, the following steps were taken. Note that the filter thresholds were optimized for SNPs only.





Optimize the QD filter

To reduce complexity, the filter thresholds were optimized one at a time. When optimizing the QD filter, for example, no other filters were applied.

The optimal QD threshold was determined as follows:


Example with a QD threshold of 10:

CHROM  POS   QD   sim1_genotype  sim2_genotype  sim3_genotype  pass_QD_filter  is_detected
I      1352  110  1/1            1/1            1/1            yes             yes
I      2566  90   1/1            0/0            1/1            yes             yes
I      3847  2    0/0            1/1            0/0            no              no
I      4975  38   1/1            0/0            0/0            no              no
I      5590  298  1/1            1/1            1/1            yes             yes


CHROM  POS   is_detected  is_in_truth  category
I      1352  yes          yes          true positive
I      2566  yes          no           false positive
I      3847  no           no           true negative
I      4975  no           yes          false negative
I      5590  yes          yes          true positive
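
The category assignment in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the report: the function name categorize and the tuple layout are assumptions made for the example, and the five sites are copied from the table.

```python
def categorize(is_detected: bool, is_in_truth: bool) -> str:
    """Assign a site to one of the four categories (hypothetical helper)."""
    if is_detected:
        return "true positive" if is_in_truth else "false positive"
    return "false negative" if is_in_truth else "true negative"


# (POS, is_detected, is_in_truth) for the five example sites on chromosome I
sites = [
    (1352, True, True),
    (2566, True, False),
    (3847, False, False),
    (4975, False, True),
    (5590, True, True),
]

# Map each position to its category
categories = {pos: categorize(detected, truth) for pos, detected, truth in sites}

for pos, category in categories.items():
    print(pos, category)
```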


              in_truth                  not_in_truth
detected      count of true positives   count of false positives
not_detected  count of false negatives  count of true negatives
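
As a rough sketch of how the 2x2 table could be filled in, the snippet below tallies (is_detected, is_in_truth) pairs with Python's collections.Counter. The function name confusion_counts and the nested-dictionary layout are assumptions chosen to mirror the table; the input pairs are the five example sites listed earlier in this section.

```python
from collections import Counter


def confusion_counts(calls):
    """Tally (is_detected, is_in_truth) pairs into a 2x2 table (hypothetical helper)."""
    counts = Counter(calls)
    return {
        "detected": {
            "in_truth": counts[(True, True)],        # true positives
            "not_in_truth": counts[(True, False)],   # false positives
        },
        "not_detected": {
            "in_truth": counts[(False, True)],       # false negatives
            "not_in_truth": counts[(False, False)],  # true negatives
        },
    }


# (is_detected, is_in_truth) for positions 1352, 2566, 3847, 4975, 5590
example_calls = [
    (True, True),
    (True, False),
    (False, False),
    (False, True),
    (True, True),
]

matrix = confusion_counts(example_calls)
print(matrix)
```

For the five example sites this gives two true positives and one each of the other three categories.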



Optimize the SOR filter


Optimize the FS filter


QD, SOR and FS filters in combination